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Abstract 

We study the properties of the distance between attractors in Random Boolean 
Networks, a prominent model of genetic regulatory networks. We define three dis- 
tance measures, upon which attractor distance matrices are constructed and their 
main statistic parameters are computed. The experimental analysis shows that or- 
dered networks have a very clustered set of attractors, while chaotic networks' at- 
tractors are scattered; critical networks show, instead, a pattern with characteristics 
of both ordered and chaotic networks. 

1 Introduction 

Boolean networks (BNs) have been introduced as models of genetic regulatory net- 
works (GRNs) by Kauffman (8). Their interest as GRN models relies primarily in the 
fact that some classes of BNs statistically reproduce some characteristics of real cells. 
For example, it has been shown that single gene knock-out experiments can be sim- 
ulated in Random BNs lfT4l . Reproducing statistical properties of real cells through 
specific classes of GRNs is called the ensemble approach, that was proposed by Kauff- 
man J9)- In this approach, the objective of the modelling process is not to define a 
model for a single, specific cell, but rather to reproduce the statistics of the parameters 
of interest of the ensemble of cells of the given type. In general, for an ensemble of real 
cells it is possible to define a set of features, such as some parameter on cell dynamics 
in case of perturbation. An ensemble of GRNs is of interest if the matching between its 
features and those of real cells is high. Nowadays, modern biotechnology tools, such as 
DNA microarrays, make it possible to gather a huge amount of biological data, hence 
the ensemble approach can be applied even more effectively than in the past. 

A prominent feature that can be considered in the ensemble approach is the dis- 
tribution in distances between gene expression levels in different types of cells. It 
was conjectures by Kauffman [8| that attractors in Random BNs correspond to cellu- 
lar types, a conjecture further refined in terms of threshold ergodic sets by Serra et al. 
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in lfT2l . In order to test this conjecture, two issues have to be addressed: first, the prop- 
erties of BN attractors have to be studied; second, these properties have to be compared 
with the ones of cellular types. In this work, we aim at providing a contribution to the 
first issue by studying the statistics of distances between attractors in Random BNs. 
In Section [2] we briefly summarise the main concepts and results in the field of BNs 
with emphasis on Random BNs. We then discuss measures and features of interest for 
the attractors in BNs in Section [3] Results of an experimental analysis are presented 
and discussed in Section [4] We finally summarise this contribution and outline future 
research in Section[5] 

2 Boolean networks 

BNs have been firstly introduced by Kauffman [ 8 1 and subsequently received consid- 
erable attention in the composite community of complex systems research. Recent 
advances in this field can be mainly found in works addressing themes in genetic reg- 
ulatory networks or investigating properties of BNs themselves ifTl l4l [TOl [131 . 

A BN is a discrete-state and discrete-time dynamical system defined by a directed 
graph of n nodes, each associated to a Boolean variable Xi, i = 1, . . . , n, and a Boolean 
function fi(Xi ± , . . . , Xi k ), where fc, is the number of inputs of node i. Often, fej is 
chosen to be equal to a constant value k for every i. The arguments of the Boolean 
function /j are the nodes whose outgoing arcs are connected to node i. The state of the 
system at time t, t € N, is defined by the array of the n Boolean variable values at time 
t: s(t) = (xi(t), . . . , x n (t)). The most studied dynamics for BNs is synchronous, i.e., 
nodes update their states in parallel, and deterministic. However, many variants exists, 
including asynchronous and probabilistic update rules [5 |. 

In this work, we consider networks ruled by synchronous and deterministic dy- 
namics. Given this setting, the network trajectory in the n-dimensional state space is 
a sequence of states composing a transient, possibly empty, followed by an attractor, 
that is a cycle of length I e [1, . . . , 2™]. When BNs are employed as genetic regulatory 
network models, attractors assume a notable relevance as they can be interpreted as 
cellular types (5J. This interpretation has recently been extended by considering sets 
of attractors, the so-called Threshold Ergodic Sets (TESg), instead of single attrac- 
tors JT2]- This extension provides support to the usefulness of RBNs as GRN models, 
as it makes it possible also to model cell differentiation dynamics. 

A special category of BNs that has received particular attention is that of Random 
BNs, which can capture relevant phenomena in genetic and cellular mechanisms and 
complex systems in general. Random BNs (RBNs) are usually generated by choosing 
at random k inputs per node and by defining the Boolean functions by assigning to 
each entry of the truth tables a 1 with probability p and a with probability 1 — p. 
Parameter p is called homogeneity or bias. Depending on the values of k and p the 
dynamics of RBNs is ordered or chaotic. In the first case, the majority of nodes in 
the attractor is frozen and any moderate-size perturbation is rapidly dampened and the 
network returns to its original attractor. Conversely, in chaotic dynamics, attractor cy- 
cles are very long and the system is extremely sensitive to small perturbations: slightly 
different initial states lead to divergent trajectories in the state space. RBNs temporal 
evolution undergo a second order phase transition between order and chaos, governed 
by the following relation between k and p: k c — [2p c (l — p c )] _1 , where the subscript 
c denotes the critical values Q. Networks along the critical line have important prop- 
erties, such as the capability of achieving the best balance between evolvability and 
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robustness [ 1 1 and maximising the average mutual information among nodes 1 10 1. 

3 Attractor distance statistics 

In real cells, each type is characterised by a specific pattern of gene expression levels 
which can be represented as real number vectors of size n, where n is the number 
of genes. On the model side, we can make the hypothesis that each attractor of a 
BN represents a cell type. Statistics and, possibly, other kinds of information on the 
distances between attractors can be computed and then compared against equivalent 
statistics on gene expression levels in real cell types, so as to test to what extent the 
class of RBNs capture relevant properties of ensembles of real cellsQ 

In this Section, we introduce the distance measures we defined over the attractors, 
along with the statistics and properties we analysed. 

3.1 Attractor distance measures 

We defined and studied three different distances among BN attractors, namely min- 
Hamming, Euclidean and pseudo-Hamming. The first one is defined upon the states 
composing the attractors, while the other two are defined upon the average values of 
BN nodes in each attractor. 

min-Hamming. This distance measures the minimum number of node values that 
should be changed in order to let the network's trajectory jump from an attractor di- 
rectly to another one. Formally: 



where Hd(s, s') is the Hamming distance between states s and s'. It is important to 
observe that this distance does not depend on the network dynamics in the state space, 
as it simply considers the Hamming distance between states independently of the state 
space trajectory. Measures which depend on the actual state space topology can be also 
defined. 

The following distances are defined over real vectors V{Ai) — (v%, . . . , v n ), each 
one computed for a given attractor Ai . Elements Vj (j = 1 , . . . , n) are computed by 
averaging the values assumed by variable Xj along the attractor, i.e., by computing the 
fraction of times a Boolean variable assumes value 1 along the attractor. In formulas: 
given attractor A = . . . , s^) of period r, with = (x[ h \ . . . , a;„ ), h = 

1, . . . , r, each element vj of vector V(A) is computed as: Vj — ~ J2h=i x j ■ ^ nas 
to be noted that this mapping between attractors and vectors of real numbers makes it 
possible to establish a simple yet direct semantics of a BN attractor as a gene expression 
level array lf]~3l . 

Euclidean. A straightforward way of measuring the distance between two real val- 
ued vectors is to compute their Euclidean distance. This distance induces naturally a 
distance over attractors: 



The hypothesis of the correspondence between attractors and cell types is therefore operational, rather 
than ontological. 



dmH \A% , Aj ) 



mm{H d (s, s') : s E Ai, s' E Aj)} 
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pseudo-Hamming. The Euclidean distance might smooth the differences between 
expression vectors, thus making it hard to distinguish between attractors of different 
length. In fact, attractor cycles of very different length might be mapped onto real 
valued vectors whose Euclidean distance is very small. For this reason, we introduced 
a distance that is computed by summing up the number of homologous vector entries 
which are different. In formulas: 

a 

d^ H (Ai,Aj) =d^ H (Vi,Vj) = y^l - 6(vu -Vji) , where S(x, y) = 

1=1 



f 1 , if x = y 
1 , otherwise 



3.2 Attractors clustering 

Given the attractors of a BN network, a distance matrix can be constructed according 
to the distances previously defined. Besides computing the main statistical parame- 
ters of such data, distance matrices have been also used in two kinds of analysis: (i) 
distribution in (weighted) clustering coefficient and (//) attractor dendrogramsP] 



3.2.1 Clustering Coefficient 

The clustering coefficient Ci of a node i in a graph provides an estimation of the how 
much its neighbours tend to form a complete graph. For a non-weighted graph, the 
clustering coefficient Ci is equal to its maximum value 1 if neighbours of i form a 
complete graph, while it is if neighbours of i are disconnected. The average of node 
clustering coefficient provides an estimation of how much a graph is characterised by 
clusters of nodes. Formally, a network clustering coefficient is: 

where n.j is the number of edges between neighbours of node i and gi the maximum 
possible number of edges between them. It is also possible to extend the clustering 
coefficient definition to weighted graphs 1 15 1; in this case, the greater the edge weight, 
the stronger is the intensity of the connection between the two nodes. Values used for 
computing this measure are taken from a network adjacency matrix A — [flij), where 
an element a.- L j corresponds to the weight of the edge which has its tail in % and its head 
in j; ctij = if i = j or edge (i, j) is not present. In formulas: 

In our analysis, the weight of an edge that connects two nodes (attractors) is the (nor- 
malised) reciprocal of the distance between the attractors. 



3.2.2 Dendrograms 

A network attractor distance matrix can be also used to graphically represent clustered 
distribution of attractors. For each network, a dendrogram has been generated, which 
represents in a single data structure all the possible clusters of the elements in a set. 
Attractor dendrogram analysis yields a graphical representation of the tendency of the 
attractors to gather into clusters. The results we present are based on dendrograms 
constructed using 'single-link' algorithm Q. 

Preliminary results have been published in II 11 . 



4 



Table 1 : Distance statistics 



Minimum Hamming distance 



Bias 


Min. 


1st Qu. 


Median 


Mean 


3rd Qu. 


Max. 


0.5 


1 


5 


9 


9.03 


12 


29 


0.788675 


1 


1 


4 


4.62 


7 


24 


0.85 


1 


1 


2 


3.86 


5 


18 



Euclidean distance between activation vectors 


Bias 


Min. 


1st Qu. 


Median 


Mean 


3rd Qu. 


Max. 


0.5 


0.00 


0.43 


0.90 


1.23 


1.80 


4.77 


0.788675 


0.00 


0.36 


1.23 


1.18 


1.74 


4.69 


0.85 


0.00 


0.67 


1.16 


1.36 


1.88 


4.24 



Pseudo-Hamming distance between activation vectors 


Bias 


Min. 


1st Qu. 


Median 


Mean 


3rd Qu. 


Max. 


0.5 





66 


68 


63.44 


70 


70 


0.788675 





3 


12 


13.61 


22 


51 


0.85 





3 


8 


8.35 


10 


27 



4 Experimental analysis 

In this Section, we present the results of the experimental analysis performed by simu- 
lating BNs with 'The Boolean Network Toolkit' |2). We analysed the main statistical 
parameters of the distances between attractors in RBNs with 70 nodes|^]fc = 3 and bias 
values such that the networks are in ordered (p = 0.85), chaotic (p — 0.5) and crit- 
ical (p — 0.788675) phases. For each parameter configuration, 50 independent RBN 
realisations have been generated. Each network dynamics has been simulated for at 
most 10 6 steps, starting from 10 5 initial states picked uniformly at random in order to 
sample attractor cycles. 

4.1 Distance measures statistics 

A first analysis concerns the main statistical parameters of the distance matrices. Ta- 
ble [T] shows the minimum, maximum, mean, median and 1st and 3rd quartile values of 
such quantities. We can observe that the maximal distances, independently of the actual 
definition used, are in chaotic networks. Moreover, also the mean and median values 
of attractor distance in chaotic networks are considerably higher than those of critical 
and ordered networks. The differences between the last two classes are smaller than 
those with respect to chaotic ones, even though critical networks show a larger spread 
in values and higher average values]^] It is remarkable to observe that the qualitative 
pattern is the same, independently of the distance measure. 
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4.2 Attractors clustering 

In Figures [TJ a), [TJb) andJTJc), the histogram of the average clustering coefficient distri- 
bution is plotted for chaotic, critical and ordered BNs, respectively. The distance mea- 
sure considered is the min-Hamming, but qualitatively analogous results have obtained 
also with the other distance measures. The pattern emerging from the histograms is not 
surprising: chaotic network attractors have a very low tendency of forming clusters, 
while in critical and ordered networks, attractors are clearly clustered. It is interesting 
to note that critical networks seem to exhibit a pattern that is a mixture of the chaotic 
and ordered ones, because the clustering coefficient distribution spans, with signifi- 
cant values, across the whole range. A similar picture emerges from the dendrograms, 
which graphically capture the clusters emerging among attractors. In Figures |2ja), 
|2|b) and|2|c), typical cases of dendrograms for chaotic, critical and ordered BNs are 
respectively plotted. 

5 Conclusion and future work 

In this work, we have studied some relevant statistical features of the distances among 
attractors in RBNs. We observed that chaotic networks have a scattered attractors set, 
while ordered networks' attractors show a strong tendency to form a cluster; critical 
networks exhibit a pattern that is a mixture of the two previous cases. This contribution 
is a first step towards the study of attractor sets and landscapes in RBNs with the aim 
of testing whether this GRN model is suitable for reproducing features of real cells. A 
further step will be the comparison against data of expression levels of genes in real 
cells. 
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Figure 1 : Average clustering coefficient distribution 
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Figure 2: Typical samples of attractor dendrograms. 
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